Nosana AI Inference with OpenAI SDK
Last updated: August 1, 2025
Introduction
This tutorial shows how to use the OpenAI SDK to connect directly to AI models deployed on Nosana's GPU network. Nosana services expose OpenAI-compatible endpoints, making it easy to integrate with existing AI applications.
What You'll Learn
- Connect OpenAI SDK to Nosana AI endpoints
- Generate text with different parameters
- Use streaming for real-time responses
- Process multiple requests efficiently
- Build practical AI workflows
Prerequisites
- Python 3.8+
- Basic understanding of AI APIs
- A deployed Nosana AI service URL
Setup & Installation
# Uncomment to install required packages
# !pip install openai requests pillow matplotlib python-dotenv -q
from openai import OpenAI
import requests
import base64
import json
from PIL import Image
import matplotlib.pyplot as plt
from io import BytesIO
import time
import os
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
True
Connect to Nosana AI Service
The Nosana service URL is loaded from your .env file. Most Nosana deployments expose OpenAI-compatible APIs at /v1.
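For reference, the .env file in your project directory only needs the service URL. The value below is a placeholder; substitute the URL of your own deployed Nosana service:

NOSANA_BASE_URL=https://<your-deployment-id>.node.k8s.prd.nos.ci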
# Load Nosana service URL from environment variables
NOSANA_BASE_URL = os.getenv("NOSANA_BASE_URL")
MODEL_NAME = "DeepSeek-R1-Distill-Qwen-1.5B" # Replace with your model name
if not NOSANA_BASE_URL:
    raise ValueError("NOSANA_BASE_URL not found in environment variables. Please check your .env file.")

# Initialize OpenAI client with Nosana endpoint
client = OpenAI(
    base_url=f"{NOSANA_BASE_URL}/v1",
    api_key="nosana-key"  # Many Nosana services don't require real API keys
)

print("Connected to Nosana AI service")
print(f"Endpoint: {NOSANA_BASE_URL}/v1")
print(f"Model: {MODEL_NAME}")
Connected to Nosana AI service
Endpoint: https://4oetidyuynh82uhbxwfmgmkyniw3fvyrz92eqtkwj6yb.node.k8s.prd.nos.ci//v1
Model: DeepSeek-R1-Distill-Qwen-1.5B
# Check available models (optional)
try:
    models = client.models.list()
    print("\nAvailable models:")
    for model in models.data:
        print(f"  • {model.id}")
except Exception as e:
    print(f"\nCould not list models: {e}")
    print("This is normal for some Nosana deployments.")
Available models:
  • DeepSeek-R1-Distill-Qwen-1.5B
Basic Text Generation
# Simple text generation
def generate_text(prompt, max_tokens=300, temperature=0.7, stream=False):
    try:
        response = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[
                {"role": "user", "content": prompt}
            ],
            max_tokens=max_tokens,
            temperature=temperature,
            stream=stream
        )
        if stream:
            return response  # Return generator for streaming
        else:
            return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"
# Test basic generation
test_prompt = "Explain what Nosana is in simple terms."
print(f"Testing: {test_prompt}")
print("\nResponse:")
response = generate_text(test_prompt)
print(response)
Testing: Explain what Nosana is in simple terms.

Response:
Okay, so I need to explain what Nosana is in simple terms. Hmm, I'm not exactly sure what Nosana is, but I've heard the name before. Maybe it's a video game? I'll try to break it down. First, I remember that Nosana has something to do with space exploration or maybe something related to space travel. It's probably a game where you play as a crew member. So maybe it's like a role-playing game where you're part of a space mission. I think about the crew members. They would probably have to deal with things like spacewalks, repairs, and maybe even friendly or hostile alien creatures. That makes sense because in space, you have to be prepared for various situations. I also recall that there's a lot of technology involved. Maybe it's like a spaceship that you pilot, but the spaceship is actually made up of different parts. Each part could represent different systems or systems within the spaceship. That sounds a bit complicated, but it's just a way to model the spaceship. Another thing I remember is that the game uses a lot of special items or tech items. These might be things like communication devices, medical equipment, or maybe even something that helps with navigation. These items would be essential for the crew to survive and operate effectively. I think about the players themselves. They would have to make choices that affect the spaceship's operations. For example, they might choose to repair a part of the ship, send a
Streaming Responses
# Streaming for real-time output
def stream_response(prompt, max_tokens=500, temperature=0.7):
    print(f"Streaming response for: {prompt}")
    print("\n" + "=" * 60)
    try:
        stream = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
            temperature=temperature,
            stream=True
        )
        full_response = ""
        for chunk in stream:
            if chunk.choices[0].delta.content:
                content = chunk.choices[0].delta.content
                full_response += content
                print(content, end="", flush=True)
        print("\n" + "=" * 60)
        print("Streaming complete!\n")
        return full_response
    except Exception as e:
        print(f"\nError: {str(e)}")
        return None
# Example streaming
stream_prompt = "Write a detailed explanation of how blockchain technology works"
streamed_text = stream_response(stream_prompt, max_tokens=600, temperature=0.5)
Streaming response for: Write a detailed explanation of how blockchain technology works

============================================================
Okay, so I need to explain how blockchain technology works in detail. I remember that blockchain is the technology used in Bitcoin, but I'm a bit fuzzy on the exact steps. Let me try to break it down. First, I think blockchain is a distributed ledger, right? It's like a digital document that's secure and tamper-proof. But how does it work exactly? I remember something about nodes, which are computers or other blockchain systems. So, nodes exchange blocks of data, which are like records of transactions. But how does this exchange happen? I think it's through something called a proof-of-work mechanism. That means each node has to do some computational work to validate the block. So, the more work you do, the more you get paid. That makes sense because it adds a layer of security since it's hard to cheat. Wait, but how does the proof-of-work work in detail? I think it involves creating a hash, which is like a digital fingerprint of the data. Then, the hash has to be below a certain threshold to be considered valid. So, the node has to find a hash that meets this condition, which requires some computational effort. Once a node validates a block, it gets rewarded with cryptocurrency. But how does that reward work? I think it's based on the amount of work the node did. So, the more work, the more rewards. But I'm not sure how this incentivizes honest nodes over malicious ones. Then, when a block is added to the main chain, it's called a consensus mechanism. I think this is where nodes agree on the order of blocks. If a node adds a block that's inconsistent with the main chain, it's considered malicious. But how does this happen? Maybe through a proof-of-work that's harder than the main chain. Once the main chain is agreed upon, the network can validate transactions. This is where the consensus mechanism comes into play again. If a node validates a transaction against the main chain, it's considered honest. But if it's inconsistent, it's considered malicious. Once a transaction is validated, it's added to the blockchain. But how does this happen in real-time? I think it's through a process called proof-of-work, where the node has to solve a complex problem to add the transaction to the chain. This adds a layer of security because it's computationally intensive and hard to cheat. I'm a bit confused about the proof-of-work process. How exactly does the node find a hash that's below the target? I think it's a process where the node generates a random input, computes the hash, and if it's below the target, it's considered valid. But how does this balance between honest and malicious nodes? Maybe the honest nodes do more work, so they get more rewards, while malicious nodes do less. Also, how does the network reach consensus? I think it's through a process called proof-of-stake, where
============================================================
Streaming complete!
Conversation Context & Memory
# Multi-turn conversation with context
def have_conversation(messages, new_message, max_tokens=300, temperature=0.7):
    # Add new message to conversation
    messages.append({"role": "user", "content": new_message})
    try:
        response = client.chat.completions.create(
            model=MODEL_NAME,
            messages=messages,
            max_tokens=max_tokens,
            temperature=temperature
        )
        assistant_message = response.choices[0].message.content
        messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message, messages
    except Exception as e:
        return f"Error: {str(e)}", messages
# Start a conversation about AI
conversation = []
print("Multi-turn Conversation Example:\n")

# Turn 1
response1, conversation = have_conversation(
    conversation,
    "Hi! Can you explain what machine learning is?"
)
print("User: Hi! Can you explain what machine learning is?")
print(f"Assistant: {response1}\n")

# Turn 2
response2, conversation = have_conversation(
    conversation,
    "That's interesting! Can you give me a real-world example?"
)
print("User: That's interesting! Can you give me a real-world example?")
print(f"Assistant: {response2}\n")

# Turn 3
response3, conversation = have_conversation(
    conversation,
    "How does that relate to what you explained earlier?"
)
print("User: How does that relate to what you explained earlier?")
print(f"Assistant: {response3}\n")

print(f"Conversation length: {len(conversation)} messages")
Multi-turn Conversation Example:

User: Hi! Can you explain what machine learning is?
Assistant: Okay, so I need to explain what machine learning is. Hmm, where do I start? I've heard about it a lot, but I'm not exactly sure what it really is. Let me think. I know it has something to do with computers learning from data, but I'm not entirely clear on the specifics. Maybe I should break it down. From what I remember, machine learning is a type of artificial intelligence, right? So AI is about machines doing tasks like speech recognition or computer vision, but I think machine learning is a subset of that. But how exactly does it work? Is it about algorithms? I think there are different types of machine learning, like supervised, unsupervised, and reinforcement learning. Wait, what's the difference between them? I remember supervised learning involves labeled data, so the model knows the correct answers and learns from them. Unsupervised is when there's no labeled data, and it finds patterns on its own. And reinforcement learning is where the model learns by interacting with an environment and receiving rewards or penalties. That makes sense because I've seen games like AlphaGo where AI learns through trial and error. But how do these algorithms actually work? Like, what's the process of training a model? I think it involves feeding the data into the model, which adjusts its internal parameters to minimize errors. There must be some kind of optimization process, maybe gradient descent, to find the best parameters. I also remember something about feature engineering. That's

User: That's interesting! Can you give me a real-world example?
Assistant: Okay, so the user just asked for a real-world example of machine learning. I need to provide a clear and relatable example that someone without a technical background can understand. First, I should connect machine learning to something they know, like everyday technology. Maybe something like recommendation systems or image processing. These are areas where people have used AI in their daily lives. I should explain how machine learning works in a simple way. Maybe start with a scenario they can visualize, like a recommendation engine for a streaming service. The user might not be familiar with the technical terms, so I should avoid jargon as much as possible. Breaking it down into steps makes it easier to follow. I'll outline how the system collects data, processes it, trains the model, and then applies it. Ending with how it benefits society shows the practical impact. I should also highlight the benefits of machine learning, like personalized experiences and efficiency, to make it seem relevant and useful. Keeping the language simple and conversational will help the user grasp the concept without feeling overwhelmed. </think> Sure! Let's break it down with a simple, real-world example. Imagine you're using an app that tells you what your friends might like next on your phone. This is a recommendation system. Machine learning is the technology behind it. Here's how it works: 1. **Collect Data**: The app collects data about what your friends have liked in the past. This could be based on their viewing history, genre preferences, or even

User: How does that relate to what you explained earlier?
Assistant: Okay, so I'm trying to understand how machine learning relates to the recommendation system example I just thought of. Let me go through it step by step. First, I know that machine learning is a subset of AI that allows computers to learn from data. It's about developing algorithms that can improve their performance as they are exposed to more data. The key here is that the models learn without being explicitly programmed. In the recommendation system, the app uses this learning to predict what a friend might like. So, the data collected is about the user's viewing history and preferences. This data is like the training data that the machine learning model uses to make predictions. Now, how does the model actually learn? I remember that there's something called feature engineering, where the data is transformed into features that the model can use. In this case, features might include things like the time of day, mood, or the genre of the content the user has been watching. Then there's the training process, which involves feeding this data into the machine learning algorithm. The algorithm adjusts its internal parameters to minimize errors, which are differences between the predicted and actual recommendations. This adjustment is where the optimization happens, maybe using methods like gradient descent. I also remember the concept of overfitting, where the model might perform well on the training data but poorly on new, unseen data. Regularization is a technique used to prevent this by adding constraints to the model, making it more generalizable. So, putting it all

Conversation length: 6 messages
Batch Processing
# Process multiple texts efficiently
def batch_process(texts, task_instruction, max_tokens=200, temperature=0.3):
    results = []
    for i, text in enumerate(texts, 1):
        print(f"Processing {i}/{len(texts)}: {text[:50]}...")
        prompt = f"{task_instruction}\n\nText: {text}"
        try:
            response = client.chat.completions.create(
                model=MODEL_NAME,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=max_tokens,
                temperature=temperature
            )
            result = response.choices[0].message.content
            results.append({
                "index": i,
                "original": text,
                "processed": result
            })
        except Exception as e:
            results.append({
                "index": i,
                "original": text,
                "error": str(e)
            })
    return results
# Example: Summarize multiple articles
articles = [
"Nosana is building a decentralized GPU network that makes AI compute more accessible and affordable. The platform allows anyone to rent GPU power from a distributed network of providers, reducing costs and increasing availability for AI developers.",
"Blockchain technology has evolved beyond cryptocurrencies to enable new forms of decentralized computing. Smart contracts can automatically manage resource allocation, payments, and service level agreements in distributed networks.",
"The demand for GPU computing has exploded with the rise of AI applications. Traditional cloud providers often have limited availability and high costs, creating opportunities for decentralized alternatives that can utilize idle hardware from various sources.",
"Decentralized infrastructure offers several advantages including censorship resistance, reduced single points of failure, and more competitive pricing through open markets. These benefits are particularly valuable for AI workloads that require significant computational resources."
]
task = "Summarize this text in one clear sentence:"
print(f"Batch Processing: Summarizing {len(articles)} articles\n")
summaries = batch_process(articles, task, max_tokens=100, temperature=0.2)

print("\nResults:")
print("=" * 80)
for summary in summaries:
    if "error" in summary:
        print(f"\n{summary['index']}. Error: {summary['error']}")
    else:
        print(f"\n{summary['index']}. Original: {summary['original'][:80]}...")
        print(f"   Summary: {summary['processed']}")
Batch Processing: Summarizing 4 articles

Processing 1/4: Nosana is building a decentralized GPU network tha...
Processing 2/4: Blockchain technology has evolved beyond cryptocur...
Processing 3/4: The demand for GPU computing has exploded with the...
Processing 4/4: Decentralized infrastructure offers several advant...

Results:
================================================================================

1. Original: Nosana is building a decentralized GPU network that makes AI compute more access...
   Summary: Okay, so I need to summarize the given text into one clear sentence. Let me read through the text again to make sure I understand it properly. The text says, "Nosana is building a decentralized GPU network that makes AI compute more accessible and affordable. The platform allows anyone to rent GPU power from a distributed network of providers, reducing costs and increasing availability for AI developers." Alright, so the main points are: 1. Nosana is creating a decentralized GPU network. 2. This network aims

2. Original: Blockchain technology has evolved beyond cryptocurrencies to enable new forms of...
   Summary: Alright, so I need to summarize this text into one clear sentence. Let me read through it again to make sure I understand what it's saying. The text mentions that blockchain technology has evolved beyond just cryptocurrencies to enable new forms of decentralized computing. It specifically talks about smart contracts that can manage resource allocation, payments, and service level agreements in distributed networks. Hmm, so the main points are: blockchain technology has expanded beyond crypto to other uses, smart contracts are a key part of this, and they

3. Original: The demand for GPU computing has exploded with the rise of AI applications. Trad...
   Summary: Okay, so I need to summarize this text into one clear sentence. Let me read through it again to make sure I understand it properly. The text says that the demand for GPU computing has exploded because of the rise in AI applications. Traditional cloud providers usually don't have enough hardware or are expensive, so they're looking for alternatives. These alternatives can use unused hardware from various sources to make computing more efficient and cost-effective. Hmm, so the main points are: GPU computing, AI applications, explosion

4. Original: Decentralized infrastructure offers several advantages including censorship resi...
   Summary: Okay, so I need to summarize this text into one clear sentence. Let me read through it again to make sure I understand all the key points. The text says that decentralized infrastructure offers several advantages, specifically censorship resistance, reduced single points of failure, and competitive pricing through open markets. These benefits are especially important for AI workloads that need a lot of computational resources. Hmm, so the main points are: decentralization makes things more resistant to censorship, less vulnerable to single points of failure, and
Best Practices & Tips
Optimizing Requests
- Temperature Settings: Use 0.1-0.3 for factual tasks, 0.7-0.9 for creative tasks
- Token Limits: Set appropriate max_tokens to avoid unnecessary costs
- Batch Processing: Group similar requests to maximize efficiency
- Caching: Store responses for repeated queries (see the sketch after this list)
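As a quick illustration of the temperature and caching tips, here is a minimal in-memory cache built on the generate_text() helper from earlier. The names cached_generate and _response_cache are introduced just for this sketch; a production setup would likely add an expiry policy or a persistent store.

# Minimal response cache (sketch): keyed on everything that affects the output.
_response_cache = {}

def cached_generate(prompt, max_tokens=200, temperature=0.2):
    key = (prompt, max_tokens, temperature)
    if key not in _response_cache:
        # A low temperature keeps answers stable enough to be worth reusing.
        _response_cache[key] = generate_text(prompt, max_tokens=max_tokens, temperature=temperature)
    return _response_cache[key]

# The second identical call is served from the cache instead of the endpoint.
print(cached_generate("What is a decentralized GPU network?"))
print(cached_generate("What is a decentralized GPU network?"))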
Performance Tips
- Use streaming for long responses to improve perceived speed
- Monitor response times and adjust timeout settings (see the sketch after this list)
- Consider prompt length impact on processing time
- Test different model configurations for your use case
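To make the timing and timeout tips concrete, the sketch below wraps a single request with a timer and sets the client-level timeout and max_retries options of the OpenAI Python SDK. The 30-second timeout, 2 retries, and the timed_client name are assumptions for illustration; tune the values for your own deployment.

import time
from openai import OpenAI

# Client with explicit timeout and retry settings (values are illustrative).
timed_client = OpenAI(
    base_url=f"{NOSANA_BASE_URL}/v1",
    api_key="nosana-key",
    timeout=30.0,    # seconds before a request is abandoned
    max_retries=2,   # automatic retries for transient failures
)

start = time.time()
reply = timed_client.chat.completions.create(
    model=MODEL_NAME,
    messages=[{"role": "user", "content": "Summarize Nosana in one sentence."}],
    max_tokens=60,
)
elapsed = time.time() - start

print(f"Response time: {elapsed:.2f}s")
print(reply.choices[0].message.content)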
Summary
You've learned how to:
- Connect OpenAI SDK to Nosana endpoints for seamless integration
- Generate text with various parameters for different use cases
- Use streaming responses for better user experience
- Maintain conversation context across multiple turns
- Process batches efficiently for high-volume applications
- Build complete workflows for real-world applications
You're now ready to build powerful AI applications using Nosana's GPU network with the familiar OpenAI SDK!